Support Export of LTX-Video 0.9.1 #1652
likholat left a comment:
Please add tests for tiny-random-ltx-0.9.1-video model conversion and inference.
You can refer to similar tests that already exist for the supported LTX model:
repo:huggingface/optimum-intel ltx-video path:/^tests\/openvino\//
done. Please review.

@rkazants please take a look

@altnnatra, please take a look
@rkazants I tried exporting the test model and it exports successfully. Let me know if any other changes are needed. Please rerun the checks, thank you!
Tests cannot be collected and export fails in environments with |
@anatyrova

The branch is based on an older commit of
@anatyrova I updated the branch to the latest commit. Exporting the model works and "python -m pytest tests/openvino/test_ltx_export_config.py -v" passes. The huggingface_hub version used is still 0.36.2 (it was downloaded automatically). Please try running again, thank you!
@anatyrova what version of diffusers is CI using? I think the diffusers version on CI is a bit old and doesn't have "timestep_scale_multiplier".
I reviewed the PR to check whether it introduces any regression for the previous LTX-Video model that we already support (model ID Lightricks/LTX-Video).
The similarity results (wwb) were almost identical: 0.838127 (main) and 0.83812696 (PR) similarity compared to the original model. @rkazants
"sana": "optimum-intel-internal-testing/tiny-random-sana",
"sana-sprint": "optimum-intel-internal-testing/tiny-random-sana-sprint",
"ltx-video": "optimum-intel-internal-testing/tiny-random-ltx-video",
"ltx-video-0.9.1": "creeper-hat/tiny-random-ltx-video-0.9.1",
@anatyrova, please help to upload this tiny model into optimum-intel-internal-testing

added: optimum-intel-internal-testing/tiny-random-ltx-video-0.9.1
self._model.pos_embed.forward = self._model.pos_embed._orig_forward

def _ltx_vae_decoder_forward(model, latent_sample, timestep=None):
@Yash-Vijay29, why do we need this new model patching? Why does the existing patching not work?
Please try without this patch. If that is not possible, please leave a comment explaining why we need it.

The current VAE decoder export is not enough for LTX 0.9.1-style VAEs because these models are timestep-conditioned. Diffusers calls the decoder as vae.decode(latents, timestep) / decode(z, temb=...) only when vae.config.timestep_conditioning is true. If we export it through the normal VAE decoder path, the OpenVINO graph only has latent_sample, so decode_timestep cannot be passed at runtime.
I tried running it earlier: it produced an incorrect dummy shape for the timestep and failed to export the model for these weights.
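
To illustrate the point above, here is a minimal sketch of a decoder forward that takes the timestep as an explicit second input, so that an exported graph would carry it alongside latent_sample. ToyTimestepVAE and decoder_forward are hypothetical stand-ins written for this illustration; they are not the diffusers classes or the actual patch in this PR, and only the decode(z, temb=...) call shape and the config.timestep_conditioning flag are taken from the comment above.

```python
from types import SimpleNamespace


class ToyTimestepVAE:
    """Hypothetical stand-in for a VAE; not the diffusers implementation."""

    def __init__(self, timestep_conditioning):
        self.config = SimpleNamespace(timestep_conditioning=timestep_conditioning)

    def decode(self, z, temb=None):
        # Report which inputs the decoder actually received.
        return {"latents": z, "temb": temb}


def decoder_forward(model, latent_sample, timestep=None):
    """Forward the timestep only when the VAE is timestep-conditioned."""
    if getattr(model.config, "timestep_conditioning", False):
        if timestep is None:
            raise ValueError("timestep is required for a timestep-conditioned VAE")
        return model.decode(latent_sample, temb=timestep)
    # VAEs without timestep conditioning keep the single-input path.
    return model.decode(latent_sample)
```

With this shape, a VAE without timestep conditioning never sees the extra argument, while a conditioned one fails loudly if the timestep is missing instead of silently decoding without it.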
return [value] * effective_batch_size

def __call__(self, *args, **kwargs):
Why do we need this call method, and why does the existing one not work?
Can this new call method affect existing LTX inference?

While testing, num_videos_per_prompt > 1 was an issue for the timestep-enabled models.
Diffusers LTX currently prepares latents with the effective batch:
batch_size * num_videos_per_prompt
but later, for VAE timestep conditioning, it expands decode_timestep only to:
batch_size
In one of the tests this produced:
batch_size = 3
num_videos_per_prompt = 3
latents batch = 9
decode_noise_scale / timestep batch = 3
giving a 3 vs 9 tensor error.
I added it because I had run the (currently deleted) test suite on the new ltx_video-0.9.1 and it failed this test as a result.
I will narrow it down so it only applies when num_videos_per_prompt > 1 and timestep_conditioning = True,
as an extra precaution so it doesn't affect existing LTX inference (existing LTX inference only exists for models where timestep_conditioning is False).
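
The mismatch and the proposed narrowing can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: needs_expansion and expand_to_effective_batch are hypothetical helper names, and the choice to repeat each per-prompt value consecutively for its videos is an assumption about ordering.

```python
def needs_expansion(num_videos_per_prompt, timestep_conditioning):
    """Only timestep-conditioned VAEs with several videos per prompt are affected."""
    return num_videos_per_prompt > 1 and bool(timestep_conditioning)


def expand_to_effective_batch(value, batch_size, num_videos_per_prompt):
    """Expand a decode-time value to batch_size * num_videos_per_prompt entries."""
    effective_batch_size = batch_size * num_videos_per_prompt
    if isinstance(value, (list, tuple)):
        if len(value) == effective_batch_size:
            return list(value)
        if len(value) == batch_size:
            # Assumed ordering: all videos of one prompt are adjacent.
            return [v for v in value for _ in range(num_videos_per_prompt)]
        raise ValueError(
            f"expected {batch_size} or {effective_batch_size} values, got {len(value)}"
        )
    # A scalar is simply repeated for every latent in the effective batch.
    return [value] * effective_batch_size


# The failing case from the test: 3 prompts, 3 videos each, 9 latents.
timesteps = expand_to_effective_batch(0.05, batch_size=3, num_videos_per_prompt=3)
```

Here timesteps has 9 entries, matching the latents batch, whereas expanding only to batch_size would give 3.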
prompt_embeds = kwargs.get("prompt_embeds")
num_videos_per_prompt = kwargs.get("num_videos_per_prompt", 1) or 1

if num_videos_per_prompt == 1 or not getattr(self.vae.config, "timestep_conditioning", False):
@rkazants I added a line so that it only applies to models with timestep_conditioning = True and num_videos_per_prompt > 1, which is where the issue was. If you feel something is off, do let me know!

@rkazants Are we ready to merge this PR? Could you please add the PR to the merge queue?
What does this PR do?
Adds support for exporting LTX-Video 0.9.1 (a version of LTX-Video with timestep_conditioning enabled) to an IR graph.
Previously, timestep_conditioning was not supported in exports, and the dummy timestep_conditioning input produced a tensor mismatch.
Created tiny-random-ltx-video-0.9.1 for GitHub CI and testing.
Added inference and conversion tests.
TESTED HF VS OPTIMUM ACCURACY:
first ran :
then
Accuracy: 0.9844858
Metrics_per_question:
metrics.csv
Before submitting